Overloads #11

Open
wants to merge 15 commits into
base: master

@jchitel jchitel commented Mar 1, 2018

This pull request adds function and generic type overloads to Ren.

To facilitate this unexpectedly complex feature, the entire compiler stack was rewritten according to pure functional principles. The lexer, parser, type checker, and translator are now all purely functional, making use of enforced immutability, avoiding classes, and taking advantage of TypeScript's discriminated union types instead of inheritance.

All of this made the compiler logic much easier to reason about, and ultimately made it much easier to implement overloads, which were difficult to support in the prior type system. The type system has also been completely rewritten to make more use of flags, enums, and discriminated unions.

TODO:

  • Rewrite lexer to be pure
  • Rewrite parser and syntax to be pure
  • Rewrite type checker and visitors to be pure and more flexible
  • Add overload support to type system (partially done)
  • Rewrite translator to be pure
  • Add tests for all of the above

This involves quite a few changes:

Parser:
The parser has been completely redone from the ground up.
To facilitate the new compiler restriction of pure functional programming,
the parser logic makes use of function composition. The result is a
dead-simple API. There are 5 types of parse expressions: tok (tokens),
seq (sequences), select (selections), optional, and repeat. Each one
has a corresponding function that can be used to build a parse function
capable of parsing any type of node. The whole thing is now strongly
typed. Sequences are strongly typed by way of overloads, and they are
parsed as arrays instead of objects, with a transform function for
converting the arrays to objects. The concepts of 'soft' and 'definite'
are now gone, because they were overkill. The parser will now greedily
consume as much input as possible, and only throw an error when all
options are exhausted. This means that ordering needs to be enforced
even more strictly than before. The parser also makes use of the new
pure lexer, so the entire process is purely functional.
Additionally, much of the complex logic around the various types of
repetitions and left recursions has been replaced with desugaring
to simpler constructs, making the full set of logic simpler as a whole.
"Abstract" node types are explained in the Syntax Environment section.
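
To illustrate the composition style described above, here is a minimal
sketch of the five parse expressions. Only the names tok, seq, select,
optional, and repeat come from this PR; the Token, ParseResult, and Parser
types and every signature below are assumptions made for the example, and
failure is modeled as null rather than a thrown error to keep it short.

```typescript
// Hypothetical sketch: these types and signatures are NOT the Ren parser API.
interface Token { readonly image: string; }
interface ParseResult<T> { readonly value: T; readonly rest: readonly Token[]; }
// A parse function maps the remaining tokens to a value plus leftover tokens,
// or null on failure. The input is never mutated.
type Parser<T> = (input: readonly Token[]) => ParseResult<T> | null;

// tok: match a single token by its image
const tok = (image: string): Parser<Token> => input => {
    const [first, ...rest] = input;
    return first !== undefined && first.image === image ? { value: first, rest } : null;
};

// seq: overloads give each arity a strongly typed tuple; the transform
// converts the parsed array into the final value (e.g. an object).
function seq<A, B, R>(a: Parser<A>, b: Parser<B>, transform: (r: [A, B]) => R): Parser<R>;
function seq<A, B, C, R>(a: Parser<A>, b: Parser<B>, c: Parser<C>, transform: (r: [A, B, C]) => R): Parser<R>;
function seq(...args: any[]): Parser<any> {
    const transform = args[args.length - 1] as (r: any[]) => any;
    const parsers = args.slice(0, -1) as Parser<any>[];
    return input => {
        const values: any[] = [];
        let rest = input; // local bookkeeping only; nothing escapes this call
        for (const p of parsers) {
            const result = p(rest);
            if (result === null) return null;
            values.push(result.value);
            rest = result.rest;
        }
        return { value: transform(values), rest };
    };
}

// select: try each option in order, so the order of options matters
const select = <T>(...options: Parser<T>[]): Parser<T> => input => {
    for (const option of options) {
        const result = option(input);
        if (result !== null) return result;
    }
    return null;
};

// optional: succeed with undefined when the inner parser fails
const optional = <T>(p: Parser<T>): Parser<T | undefined> => input => {
    const result = p(input);
    return result !== null ? result : { value: undefined, rest: input };
};

// repeat: greedily apply p zero or more times (p must consume input to terminate)
const repeat = <T>(p: Parser<T>): Parser<T[]> => input => {
    const result = p(input);
    if (result === null) return { value: [], rest: input };
    const tail = repeat(p)(result.rest)!; // repeat never fails
    return { value: [result.value, ...tail.value], rest: tail.rest };
};

// Usage: group := '(' ('a' | 'b')* ')', transformed from an array to an object
const tokens = (...images: string[]): Token[] => images.map(image => ({ image }));
const group = seq(
    tok('('), repeat(select(tok('a'), tok('b'))), tok(')'),
    ([, items]) => ({ items: items.map(t => t.image) }),
);
const parsed = group(tokens('(', 'a', 'b', 'a', ')'));
```

Because select commits to the first option that succeeds, swapping the
order of two overlapping options changes what gets parsed, which is the
ordering concern mentioned above.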

Syntax:
The syntax has scrapped the use of classes, and now uses interfaces
with discriminated unions. The base type of all syntax node types is now
NodeBase, which defines only a 'location' property. All sub-interfaces
are required to specify a 'syntaxType' property specifying a single
specific SyntaxType enum value. This property is the discriminant,
which we will make use of in the future. This whole structure is made
possible by the extreme flexibility of the parser, which can now return
any kind of value, not just class instances. The high-level node types
(declarations, types, expressions, and statements) are now just unions
of their corresponding node types, and there is one high-level Node
type that is the union of all of them. The Program type is now called
ModuleRoot, and ModuleRoot and all import and export types are now
separated from other declarations, because they are part of their own
domain.
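
As a rough sketch of that structure: NodeBase, 'location', 'syntaxType',
and SyntaxType are names from this PR, while the concrete node interfaces,
their fields, the enum members, and the render helper are hypothetical and
exist only to show the discriminant narrowing a union. The top-level union
is called SyntaxNode here (the PR calls it Node) purely to avoid clashing
with the DOM global of the same name in a standalone snippet.

```typescript
// Hypothetical node shapes; not the actual Ren syntax definitions.
interface SourceLocation { readonly line: number; readonly column: number; }
interface NodeBase { readonly location: SourceLocation; }

enum SyntaxType { IntegerLiteral, Identifier, BinaryExpression, ReturnStatement }

// Each sub-interface pins 'syntaxType' to exactly one enum value.
interface IntegerLiteral extends NodeBase {
    readonly syntaxType: SyntaxType.IntegerLiteral;
    readonly value: number;
}
interface Identifier extends NodeBase {
    readonly syntaxType: SyntaxType.Identifier;
    readonly name: string;
}
interface BinaryExpression extends NodeBase {
    readonly syntaxType: SyntaxType.BinaryExpression;
    readonly left: Expression;
    readonly symbol: string;
    readonly right: Expression;
}
interface ReturnStatement extends NodeBase {
    readonly syntaxType: SyntaxType.ReturnStatement;
    readonly exp: Expression;
}

// High-level types are plain unions of their member node types.
type Expression = IntegerLiteral | Identifier | BinaryExpression;
type Statement = ReturnStatement;
type SyntaxNode = Expression | Statement;

// Switching on the discriminant narrows the node type in each case,
// so no instanceof checks or class hierarchy are needed.
function render(node: SyntaxNode): string {
    switch (node.syntaxType) {
        case SyntaxType.IntegerLiteral: return String(node.value);
        case SyntaxType.Identifier: return node.name;
        case SyntaxType.BinaryExpression:
            return `(${render(node.left)} ${node.symbol} ${render(node.right)})`;
        case SyntaxType.ReturnStatement: return `return ${render(node.exp)}`;
    }
}

// Nodes are plain object values, which any parse function can return.
const location: SourceLocation = { line: 1, column: 1 };
const sample: SyntaxNode = {
    syntaxType: SyntaxType.ReturnStatement,
    location,
    exp: {
        syntaxType: SyntaxType.BinaryExpression,
        location,
        left: { syntaxType: SyntaxType.Identifier, location, name: 'x' },
        symbol: '+',
        right: { syntaxType: SyntaxType.IntegerLiteral, location, value: 1 },
    },
};
```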

Syntax Environment:
The new parser API is specified in-place, not using functions.
This means that node types that reference each other circularly
will not work out of the box. We need to make use of mechanisms such
as scope hoisting and referencing undeclared variables within functions
to make it work properly. The only problem with that is that these
mechanisms do not work cross-module. To make this work, we have
introduced the concept of a "syntax environment", which is a function
that loads any circularly-referencing syntax types on-demand. All
of the high-level node types have their parser specification declared
within the syntax environment inside functions. All types that are
dependent on these do not declare their syntax at the module root,
instead declaring them in "register()" functions that declare their
dependencies as parameters. The syntax environment's module imports
all of these register functions and calls them within the environment
function, where it has access to the high-level parse functions.
This means that to have access to the parse functions of all syntax
types, you need to call the SyntaxEnvironment function, which will
return a fresh environment complete with circular references resolved.
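
The shape of this can be sketched in a single file. SyntaxEnvironment and
register() are names from this PR; the Parse type, the string-based
parsers, and the names registerParenthesized, parseParenthesized,
parseDigit, and parseExpression are hypothetical stand-ins, and the
"modules" are only marked by comments here.

```typescript
// Hypothetical stand-in for a parse function over a plain string.
type Result<T> = { readonly value: T; readonly rest: string } | null;
type Parse<T> = (input: string) => Result<T>;

// --- "module" A: depends on a high-level parser, so its syntax is declared
// inside a register() function that takes the dependency as a parameter.
function registerParenthesized(parseExpression: Parse<string>) {
    const parseParenthesized: Parse<string> = input => {
        if (input[0] !== '(') return null;
        const inner = parseExpression(input.slice(1));
        if (inner === null || inner.rest[0] !== ')') return null;
        return { value: `paren(${inner.value})`, rest: inner.rest.slice(1) };
    };
    return { parseParenthesized };
}

// --- "module" B: no circular dependency, so it is declared at the module root.
const parseDigit: Parse<string> = input =>
    /^[0-9]/.test(input) ? { value: `digit(${input[0]})`, rest: input.slice(1) } : null;

// --- environment module: imports the register functions and calls them
// where the high-level parse functions are in scope.
function SyntaxEnvironment() {
    // parseExpression is a hoisted function declaration, so it can be handed
    // to register() before its textual definition below.
    const { parseParenthesized } = registerParenthesized(parseExpression);

    // The high-level parser only touches parseParenthesized when called,
    // by which point the registration above has completed.
    function parseExpression(input: string): Result<string> {
        const result = parseParenthesized(input);
        return result !== null ? result : parseDigit(input);
    }

    return { parseExpression, parseParenthesized };
}

// Each call builds a fresh environment with the circular references resolved.
const env = SyntaxEnvironment();
const nested = env.parseExpression('((7))');
```

Hoisting makes this work inside one function body; the register()
parameters are what carry the same references across module boundaries.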

@jchitel jchitel self-assigned this Mar 1, 2018